TEI exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

The CoMeRe corpus for French: structuring and annotating heterogeneous CMC genres

Internal identifier: 000011 (Main/Exploration); previous: 000010; next: 000012

Authors: Thierry Chanier [France]; Céline Poudat [France]; Benoit Sagot [France]; Georges Antoniadis [France]; Ciara R. Wigham [France]; Linda Hriba [France]; Julien Longhi [France]; Djamé Seddah [France]

RBID : Hal:halshs-00953507

Abstract

The CoMeRe project aims to build a kernel corpus of different Computer-Mediated Communication (CMC) genres with interactions in French as the main language, by assembling interactions stemming from networks such as the Internet or telecommunication, as well as mono and multimodal, synchronous and asynchronous communications. Corpora are assembled using a standard, thanks to the TEI (Text Encoding Initiative) format. This implies extending, through a European endeavor, the TEI model of text, in order to encompass the richest and the more complex CMC genres. This paper presents the Interaction Space model. We explain how this model has been encoded within the TEI corpus header and body. The model is then instantiated through the first four corpora we have processed: three corpora where interactions occurred in single-modality environments (text chat, or SMS systems) and a fourth corpus where text chat, email and forum modalities were used simultaneously. The CoMeRe project has two main research perspectives: Discourse Analysis, only alluded to in this paper, and the linguistic study of idiolects occurring in different CMC genres. As NLP algorithms are an indispensable prerequisite for such research, we present our motivations for applying an automatic annotation process to the CoMeRe corpora. Our wish to guarantee generic annotations meant we did not consider any processing beyond morphosyntactic labelling, but prioritized the automatic annotation of any freely variant elements within the corpora. We then turn to decisions made concerning which annotations to make for which units and describe the processing pipeline for adding these. All CoMeRe corpora are verified, thanks to a staged quality control process, designed to allow corpora to move from one project phase to the next.
Public release of the CoMeRe corpora is a short-term goal: corpora will be integrated into the forthcoming French National Reference Corpus, and disseminated through the national linguistic infrastructure ORTOLANG. We, therefore, highlight issues and decisions made concerning the OpenData perspective.

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">The CoMeRe corpus for French: structuring and annotating heterogeneous CMC genres</title>
<author>
<name sortKey="Chanier, Thierry" sort="Chanier, Thierry" uniqKey="Chanier T" first="Thierry" last="Chanier">Thierry Chanier</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-229" status="VALID">
<orgName>Laboratoire de Recherche sur le Langage</orgName>
<orgName type="acronym">LRL</orgName>
<desc>
<address>
<addrLine>Maison de la Recherche - 4 rue Ledru - 63057 Clermont-Fd Cedex 1</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://lrlweb.univ-bpclermont.fr</ref>
</desc>
<listRelation>
<relation name="EA999" active="#struct-205618" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle name="EA999" active="#struct-205618" type="direct">
<org type="institution" xml:id="struct-205618" status="VALID">
<orgName>Université Blaise Pascal - Clermont-Ferrand 2</orgName>
<orgName type="acronym">UBP</orgName>
<desc>
<address>
<addrLine>34, avenue Carnot - BP 185 - 63006 Clermont-Ferrand cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-bpclermont.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
<author>
<name sortKey="Poudat, Celine" sort="Poudat, Celine" uniqKey="Poudat C" first="Céline" last="Poudat">Céline Poudat</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-24509" status="VALID">
<orgName>Lexiques, Dictionnaires, Informatique</orgName>
<orgName type="acronym">LDI</orgName>
<desc>
<address>
<addrLine>UFR Lettres, Sciences de l'Homme et des Sociétés, Université Paris 13, 99 avenue Jean-Baptiste Clément, F-93430, Villetaneuse</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www-ldi.univ-paris13.fr/</ref>
</desc>
<listRelation>
<relation active="#struct-300305" type="direct"></relation>
<relation active="#struct-303141" type="direct"></relation>
<relation active="#struct-303171" type="direct"></relation>
<relation name="UMR7187" active="#struct-441569" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-300305" type="direct">
<org type="institution" xml:id="struct-300305" status="VALID">
<orgName>Université de Cergy Pontoise</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-303141" type="direct">
<org type="institution" xml:id="struct-303141" status="VALID">
<orgName>Université Paris 13</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-303171" type="direct">
<org type="institution" xml:id="struct-303171" status="VALID">
<orgName>Université Sorbonne Paris Cité</orgName>
<orgName type="acronym">USPC</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.sorbonne-paris-cite.fr/fr</ref>
</desc>
</org>
</tutelle>
<tutelle name="UMR7187" active="#struct-441569" type="direct">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Paris</settlement>
<region type="region" nuts="2">Île-de-France</region>
</placeName>
<orgName type="university">Université Paris 13</orgName>
</affiliation>
</author>
<author>
<name sortKey="Sagot, Benoit" sort="Sagot, Benoit" uniqKey="Sagot B" first="Benoit" last="Sagot">Benoit Sagot</name>
<affiliation wicri:level="1">
<hal:affiliation type="researchteam" xml:id="struct-54505" status="OLD">
<idno type="RNSR">200818336A</idno>
<orgName>Analyse Linguistique Profonde à Grande Echelle ; Large-scale deep linguistic processing</orgName>
<orgName type="acronym">ALPAGE</orgName>
<date type="end">2016-01-31</date>
<desc>
<address>
<addrLine>Université Paris Diderot, Bât. Olympe de Gouges, case postale 7003, 75205 Paris cedex 13 - INRIA Rocquencourt</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/equipes/alpage</ref>
</desc>
<listRelation>
<relation active="#struct-86790" type="direct"></relation>
<relation active="#struct-300009" type="indirect"></relation>
<relation active="#struct-300301" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-86790" type="direct">
<org type="laboratory" xml:id="struct-86790" status="VALID">
<idno type="RNSR">196718247G</idno>
<orgName>Inria Paris-Rocquencourt</orgName>
<desc>
<address>
<addrLine>INRIA Rocquencourt : Domaine de Voluceau, Rocquencourt B.P. 105 78153 le Chesnay Cedex / INRIA Paris - 23 avenue d'Italie 75013 Paris</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/centre/paris-rocquencourt</ref>
</desc>
<listRelation>
<relation active="#struct-300009" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-300009" type="indirect">
<org type="institution" xml:id="struct-300009" status="VALID">
<orgName>Institut National de Recherche en Informatique et en Automatique</orgName>
<orgName type="acronym">Inria</orgName>
<desc>
<address>
<addrLine>Domaine de Voluceau, Rocquencourt - BP 105 - 78153 Le Chesnay Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/en/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300301" type="direct">
<org type="institution" xml:id="struct-300301" status="VALID">
<orgName>Université Paris Diderot - Paris 7</orgName>
<orgName type="acronym">UP7</orgName>
<desc>
<address>
<addrLine>5 rue Thomas-Mann - 75205 Paris cedex 13</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-paris-diderot.fr</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
<author>
<name sortKey="Antoniadis, Georges" sort="Antoniadis, Georges" uniqKey="Antoniadis G" first="Georges" last="Antoniadis">Georges Antoniadis</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-26828" status="VALID">
<orgName>LInguistique et DIdactique des Langues Étrangères et Maternelles</orgName>
<orgName type="acronym">LIDILEM</orgName>
<desc>
<address>
<addrLine>UFR des Sciences du Langage - BP 25 - 38040 Grenoble cedex 9</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://lidilem.u-grenoble3.fr/</ref>
</desc>
<listRelation>
<relation name="EA609" active="#struct-5485" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle name="EA609" active="#struct-5485" type="direct">
<org type="institution" xml:id="struct-5485" status="OLD">
<idno type="IdRef">026404125</idno>
<orgName>Université Stendhal - Grenoble 3</orgName>
<date type="end">2015-12-31</date>
<desc>
<address>
<addrLine>BP 25 38040 Grenoble Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.u-grenoble3.fr/stendhal/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
<author>
<name sortKey="Wigham, Ciara R" sort="Wigham, Ciara R" uniqKey="Wigham C" first="Ciara R." last="Wigham">Ciara R. Wigham</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-51028" status="VALID">
<idno type="IdRef">080671594</idno>
<idno type="RNSR">200311862K</idno>
<orgName>Interactions, Corpus, Apprentissages, Représentations</orgName>
<orgName type="acronym">ICAR</orgName>
<desc>
<address>
<addrLine>5, av Pierre Mendès-France 69676 BRON CEDEX</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://icar.univ-lyon2.fr/</ref>
</desc>
<listRelation>
<relation active="#struct-6818" type="direct"></relation>
<relation active="#struct-33804" type="direct"></relation>
<relation active="#struct-300042" type="direct"></relation>
<relation name="UMR5191" active="#struct-441569" type="direct"></relation>
<relation active="#struct-303652" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-6818" type="direct">
<org type="institution" xml:id="struct-6818" status="VALID">
<idno type="IdRef">149154992</idno>
<orgName>École normale supérieure - Lyon</orgName>
<orgName type="acronym">ENS Lyon</orgName>
<desc>
<address>
<addrLine>15 parvis René Descartes - BP 7000 - 69342 Lyon Cedex 07</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.ens-lyon.eu/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-33804" type="direct">
<org type="institution" xml:id="struct-33804" status="VALID">
<orgName>Université Lumière - Lyon 2</orgName>
<orgName type="acronym">UL2</orgName>
<desc>
<address>
<addrLine>86, rue Pasteur - 69007 Lyon</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-lyon2.fr</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300042" type="direct">
<org type="institution" xml:id="struct-300042" status="VALID">
<orgName>INRP</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="UMR5191" active="#struct-441569" type="direct">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-303652" type="direct">
<org type="institution" xml:id="struct-303652" status="OLD">
<orgName>Ecole Normale Supérieure Lettres et Sciences Humaines</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
<author>
<name sortKey="Hriba, Linda" sort="Hriba, Linda" uniqKey="Hriba L" first="Linda" last="Hriba">Linda Hriba</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-24509" status="VALID">
<orgName>Lexiques, Dictionnaires, Informatique</orgName>
<orgName type="acronym">LDI</orgName>
<desc>
<address>
<addrLine>UFR Lettres, Sciences de l'Homme et des Sociétés, Université Paris 13, 99 avenue Jean-Baptiste Clément, F-93430, Villetaneuse</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www-ldi.univ-paris13.fr/</ref>
</desc>
<listRelation>
<relation active="#struct-300305" type="direct"></relation>
<relation active="#struct-303141" type="direct"></relation>
<relation active="#struct-303171" type="direct"></relation>
<relation name="UMR7187" active="#struct-441569" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-300305" type="direct">
<org type="institution" xml:id="struct-300305" status="VALID">
<orgName>Université de Cergy Pontoise</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-303141" type="direct">
<org type="institution" xml:id="struct-303141" status="VALID">
<orgName>Université Paris 13</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-303171" type="direct">
<org type="institution" xml:id="struct-303171" status="VALID">
<orgName>Université Sorbonne Paris Cité</orgName>
<orgName type="acronym">USPC</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.sorbonne-paris-cite.fr/fr</ref>
</desc>
</org>
</tutelle>
<tutelle name="UMR7187" active="#struct-441569" type="direct">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Paris</settlement>
<region type="region" nuts="2">Île-de-France</region>
</placeName>
<orgName type="university">Université Paris 13</orgName>
</affiliation>
</author>
<author>
<name sortKey="Longhi, Julien" sort="Longhi, Julien" uniqKey="Longhi J" first="Julien" last="Longhi">Julien Longhi</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-156391" status="VALID">
<orgName>Centre de recherche textes et francophonies</orgName>
<orgName type="acronym">CRTF</orgName>
<desc>
<address>
<addrLine>Université de Cergy-Pontoise - 33, boulevard du Port - 95011 Cergy-Pontoise cedex</addrLine>
<country key="FR"></country>
</address>
</desc>
<listRelation>
<relation name="EA1392" active="#struct-300305" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle name="EA1392" active="#struct-300305" type="direct">
<org type="institution" xml:id="struct-300305" status="VALID">
<orgName>Université de Cergy Pontoise</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
<author>
<name sortKey="Seddah, Djame" sort="Seddah, Djame" uniqKey="Seddah D" first="Djamé" last="Seddah">Djamé Seddah</name>
<affiliation wicri:level="1">
<hal:affiliation type="researchteam" xml:id="struct-54505" status="OLD">
<idno type="RNSR">200818336A</idno>
<orgName>Analyse Linguistique Profonde à Grande Echelle ; Large-scale deep linguistic processing</orgName>
<orgName type="acronym">ALPAGE</orgName>
<date type="end">2016-01-31</date>
<desc>
<address>
<addrLine>Université Paris Diderot, Bât. Olympe de Gouges, case postale 7003, 75205 Paris cedex 13 - INRIA Rocquencourt</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/equipes/alpage</ref>
</desc>
<listRelation>
<relation active="#struct-86790" type="direct"></relation>
<relation active="#struct-300009" type="indirect"></relation>
<relation active="#struct-300301" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-86790" type="direct">
<org type="laboratory" xml:id="struct-86790" status="VALID">
<idno type="RNSR">196718247G</idno>
<orgName>Inria Paris-Rocquencourt</orgName>
<desc>
<address>
<addrLine>INRIA Rocquencourt : Domaine de Voluceau, Rocquencourt B.P. 105 78153 le Chesnay Cedex / INRIA Paris - 23 avenue d'Italie 75013 Paris</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/centre/paris-rocquencourt</ref>
</desc>
<listRelation>
<relation active="#struct-300009" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-300009" type="indirect">
<org type="institution" xml:id="struct-300009" status="VALID">
<orgName>Institut National de Recherche en Informatique et en Automatique</orgName>
<orgName type="acronym">Inria</orgName>
<desc>
<address>
<addrLine>Domaine de Voluceau, Rocquencourt - BP 105 - 78153 Le Chesnay Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/en/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300301" type="direct">
<org type="institution" xml:id="struct-300301" status="VALID">
<orgName>Université Paris Diderot - Paris 7</orgName>
<orgName type="acronym">UP7</orgName>
<desc>
<address>
<addrLine>5 rue Thomas-Mann - 75205 Paris cedex 13</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-paris-diderot.fr</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">HAL</idno>
<idno type="RBID">Hal:halshs-00953507</idno>
<idno type="halId">halshs-00953507</idno>
<idno type="halUri">https://halshs.archives-ouvertes.fr/halshs-00953507</idno>
<idno type="url">https://halshs.archives-ouvertes.fr/halshs-00953507</idno>
<date when="2014">2014</date>
<idno type="wicri:Area/Hal/Corpus">000019</idno>
<idno type="wicri:Area/Hal/Curation">000019</idno>
<idno type="wicri:Area/Hal/Checkpoint">000011</idno>
<idno type="wicri:explorRef" wicri:stream="Hal" wicri:step="Checkpoint">000011</idno>
<idno type="wicri:Area/Main/Merge">000011</idno>
<idno type="wicri:Area/Main/Curation">000011</idno>
<idno type="wicri:Area/Main/Exploration">000011</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">The CoMeRe corpus for French: structuring and annotating heterogeneous CMC genres</title>
<author>
<name sortKey="Chanier, Thierry" sort="Chanier, Thierry" uniqKey="Chanier T" first="Thierry" last="Chanier">Thierry Chanier</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-229" status="VALID">
<orgName>Laboratoire de Recherche sur le Langage</orgName>
<orgName type="acronym">LRL</orgName>
<desc>
<address>
<addrLine>Maison de la Recherche - 4 rue Ledru - 63057 Clermont-Fd Cedex 1</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://lrlweb.univ-bpclermont.fr</ref>
</desc>
<listRelation>
<relation name="EA999" active="#struct-205618" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle name="EA999" active="#struct-205618" type="direct">
<org type="institution" xml:id="struct-205618" status="VALID">
<orgName>Université Blaise Pascal - Clermont-Ferrand 2</orgName>
<orgName type="acronym">UBP</orgName>
<desc>
<address>
<addrLine>34, avenue Carnot - BP 185 - 63006 Clermont-Ferrand cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-bpclermont.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
<author>
<name sortKey="Poudat, Celine" sort="Poudat, Celine" uniqKey="Poudat C" first="Céline" last="Poudat">Céline Poudat</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-24509" status="VALID">
<orgName>Lexiques, Dictionnaires, Informatique</orgName>
<orgName type="acronym">LDI</orgName>
<desc>
<address>
<addrLine>UFR Lettres, Sciences de l'Homme et des Sociétés, Université Paris 13, 99 avenue Jean-Baptiste Clément, F-93430, Villetaneuse</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www-ldi.univ-paris13.fr/</ref>
</desc>
<listRelation>
<relation active="#struct-300305" type="direct"></relation>
<relation active="#struct-303141" type="direct"></relation>
<relation active="#struct-303171" type="direct"></relation>
<relation name="UMR7187" active="#struct-441569" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-300305" type="direct">
<org type="institution" xml:id="struct-300305" status="VALID">
<orgName>Université de Cergy Pontoise</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-303141" type="direct">
<org type="institution" xml:id="struct-303141" status="VALID">
<orgName>Université Paris 13</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-303171" type="direct">
<org type="institution" xml:id="struct-303171" status="VALID">
<orgName>Université Sorbonne Paris Cité</orgName>
<orgName type="acronym">USPC</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.sorbonne-paris-cite.fr/fr</ref>
</desc>
</org>
</tutelle>
<tutelle name="UMR7187" active="#struct-441569" type="direct">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Paris</settlement>
<region type="region" nuts="2">Île-de-France</region>
</placeName>
<orgName type="university">Université Paris 13</orgName>
</affiliation>
</author>
<author>
<name sortKey="Sagot, Benoit" sort="Sagot, Benoit" uniqKey="Sagot B" first="Benoit" last="Sagot">Benoit Sagot</name>
<affiliation wicri:level="1">
<hal:affiliation type="researchteam" xml:id="struct-54505" status="OLD">
<idno type="RNSR">200818336A</idno>
<orgName>Analyse Linguistique Profonde à Grande Echelle ; Large-scale deep linguistic processing</orgName>
<orgName type="acronym">ALPAGE</orgName>
<date type="end">2016-01-31</date>
<desc>
<address>
<addrLine>Université Paris Diderot, Bât. Olympe de Gouges, case postale 7003, 75205 Paris cedex 13 - INRIA Rocquencourt</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/equipes/alpage</ref>
</desc>
<listRelation>
<relation active="#struct-86790" type="direct"></relation>
<relation active="#struct-300009" type="indirect"></relation>
<relation active="#struct-300301" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-86790" type="direct">
<org type="laboratory" xml:id="struct-86790" status="VALID">
<idno type="RNSR">196718247G</idno>
<orgName>Inria Paris-Rocquencourt</orgName>
<desc>
<address>
<addrLine>INRIA Rocquencourt : Domaine de Voluceau, Rocquencourt B.P. 105 78153 le Chesnay Cedex / INRIA Paris - 23 avenue d'Italie 75013 Paris</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/centre/paris-rocquencourt</ref>
</desc>
<listRelation>
<relation active="#struct-300009" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-300009" type="indirect">
<org type="institution" xml:id="struct-300009" status="VALID">
<orgName>Institut National de Recherche en Informatique et en Automatique</orgName>
<orgName type="acronym">Inria</orgName>
<desc>
<address>
<addrLine>Domaine de Voluceau, Rocquencourt - BP 105 - 78153 Le Chesnay Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/en/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300301" type="direct">
<org type="institution" xml:id="struct-300301" status="VALID">
<orgName>Université Paris Diderot - Paris 7</orgName>
<orgName type="acronym">UP7</orgName>
<desc>
<address>
<addrLine>5 rue Thomas-Mann - 75205 Paris cedex 13</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-paris-diderot.fr</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
<author>
<name sortKey="Antoniadis, Georges" sort="Antoniadis, Georges" uniqKey="Antoniadis G" first="Georges" last="Antoniadis">Georges Antoniadis</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-26828" status="VALID">
<orgName>LInguistique et DIdactique des Langues Étrangères et Maternelles</orgName>
<orgName type="acronym">LIDILEM</orgName>
<desc>
<address>
<addrLine>UFR des Sciences du Langage - BP 25 - 38040 Grenoble cedex 9</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://lidilem.u-grenoble3.fr/</ref>
</desc>
<listRelation>
<relation name="EA609" active="#struct-5485" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle name="EA609" active="#struct-5485" type="direct">
<org type="institution" xml:id="struct-5485" status="OLD">
<idno type="IdRef">026404125</idno>
<orgName>Université Stendhal - Grenoble 3</orgName>
<date type="end">2015-12-31</date>
<desc>
<address>
<addrLine>BP 25 38040 Grenoble Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.u-grenoble3.fr/stendhal/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
<author>
<name sortKey="Wigham, Ciara R" sort="Wigham, Ciara R" uniqKey="Wigham C" first="Ciara R." last="Wigham">Ciara R. Wigham</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-51028" status="VALID">
<idno type="IdRef">080671594</idno>
<idno type="RNSR">200311862K</idno>
<orgName>Interactions, Corpus, Apprentissages, Représentations</orgName>
<orgName type="acronym">ICAR</orgName>
<desc>
<address>
<addrLine>5, av Pierre Mendès-France 69676 BRON CEDEX</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://icar.univ-lyon2.fr/</ref>
</desc>
<listRelation>
<relation active="#struct-6818" type="direct"></relation>
<relation active="#struct-33804" type="direct"></relation>
<relation active="#struct-300042" type="direct"></relation>
<relation name="UMR5191" active="#struct-441569" type="direct"></relation>
<relation active="#struct-303652" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-6818" type="direct">
<org type="institution" xml:id="struct-6818" status="VALID">
<idno type="IdRef">149154992</idno>
<orgName>École normale supérieure - Lyon</orgName>
<orgName type="acronym">ENS Lyon</orgName>
<desc>
<address>
<addrLine>15 parvis René Descartes - BP 7000 - 69342 Lyon Cedex 07</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.ens-lyon.eu/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-33804" type="direct">
<org type="institution" xml:id="struct-33804" status="VALID">
<orgName>Université Lumière - Lyon 2</orgName>
<orgName type="acronym">UL2</orgName>
<desc>
<address>
<addrLine>86, rue Pasteur - 69007 Lyon</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-lyon2.fr</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300042" type="direct">
<org type="institution" xml:id="struct-300042" status="VALID">
<orgName>INRP</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="UMR5191" active="#struct-441569" type="direct">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-303652" type="direct">
<org type="institution" xml:id="struct-303652" status="OLD">
<orgName>Ecole Normale Supérieure Lettres et Sciences Humaines</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
<author>
<name sortKey="Hriba, Linda" sort="Hriba, Linda" uniqKey="Hriba L" first="Linda" last="Hriba">Linda Hriba</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-24509" status="VALID">
<orgName>Lexiques, Dictionnaires, Informatique</orgName>
<orgName type="acronym">LDI</orgName>
<desc>
<address>
<addrLine>UFR Lettres, Sciences de l'Homme et des Sociétés, Université Paris 13, 99 avenue Jean-Baptiste Clément, F-93430, Villetaneuse</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www-ldi.univ-paris13.fr/</ref>
</desc>
<listRelation>
<relation active="#struct-300305" type="direct"></relation>
<relation active="#struct-303141" type="direct"></relation>
<relation active="#struct-303171" type="direct"></relation>
<relation name="UMR7187" active="#struct-441569" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-300305" type="direct">
<org type="institution" xml:id="struct-300305" status="VALID">
<orgName>Université de Cergy Pontoise</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-303141" type="direct">
<org type="institution" xml:id="struct-303141" status="VALID">
<orgName>Université Paris 13</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-303171" type="direct">
<org type="institution" xml:id="struct-303171" status="VALID">
<orgName>Université Sorbonne Paris Cité</orgName>
<orgName type="acronym">USPC</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.sorbonne-paris-cite.fr/fr</ref>
</desc>
</org>
</tutelle>
<tutelle name="UMR7187" active="#struct-441569" type="direct">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Paris</settlement>
<region type="region" nuts="2">Île-de-France</region>
</placeName>
<orgName type="university">Université Paris 13</orgName>
</affiliation>
</author>
<author>
<name sortKey="Longhi, Julien" sort="Longhi, Julien" uniqKey="Longhi J" first="Julien" last="Longhi">Julien Longhi</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-156391" status="VALID">
<orgName>Centre de recherche textes et francophonies</orgName>
<orgName type="acronym">CRTF</orgName>
<desc>
<address>
<addrLine>Université de Cergy-Pontoise - 33, boulevard du Port - 95011 Cergy-Pontoise cedex</addrLine>
<country key="FR"></country>
</address>
</desc>
<listRelation>
<relation name="EA1392" active="#struct-300305" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle name="EA1392" active="#struct-300305" type="direct">
<org type="institution" xml:id="struct-300305" status="VALID">
<orgName>Université de Cergy Pontoise</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
<author>
<name sortKey="Seddah, Djame" sort="Seddah, Djame" uniqKey="Seddah D" first="Djamé" last="Seddah">Djamé Seddah</name>
<affiliation wicri:level="1">
<hal:affiliation type="researchteam" xml:id="struct-54505" status="OLD">
<idno type="RNSR">200818336A</idno>
<orgName>Analyse Linguistique Profonde à Grande Echelle ; Large-scale deep linguistic processing</orgName>
<orgName type="acronym">ALPAGE</orgName>
<date type="end">2016-01-31</date>
<desc>
<address>
<addrLine>Université Paris Diderot, Bât. Olympe de Gouges, case postale 7003, 75205 Paris cedex 13 - INRIA Rocquencourt</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/equipes/alpage</ref>
</desc>
<listRelation>
<relation active="#struct-86790" type="direct"></relation>
<relation active="#struct-300009" type="indirect"></relation>
<relation active="#struct-300301" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-86790" type="direct">
<org type="laboratory" xml:id="struct-86790" status="VALID">
<idno type="RNSR">196718247G</idno>
<orgName>Inria Paris-Rocquencourt</orgName>
<desc>
<address>
<addrLine>INRIA Rocquencourt : Domaine de Voluceau, Rocquencourt B.P. 105 78153 le Chesnay Cedex / INRIA Paris - 23 avenue d'Italie 75013 Paris</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/centre/paris-rocquencourt</ref>
</desc>
<listRelation>
<relation active="#struct-300009" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-300009" type="indirect">
<org type="institution" xml:id="struct-300009" status="VALID">
<orgName>Institut National de Recherche en Informatique et en Automatique</orgName>
<orgName type="acronym">Inria</orgName>
<desc>
<address>
<addrLine>Domaine de Voluceau, Rocquencourt - BP 105, 78153 Le Chesnay Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/en/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300301" type="direct">
<org type="institution" xml:id="struct-300301" status="VALID">
<orgName>Université Paris Diderot - Paris 7</orgName>
<orgName type="acronym">UP7</orgName>
<desc>
<address>
<addrLine>5 rue Thomas-Mann - 75205 Paris cedex 13</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-paris-diderot.fr</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
</analytic>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="mix" xml:lang="en">
<term>CMC</term>
<term>CoMeRe</term>
<term>Computer Mediated Communication</term>
<term>corpus</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">The CoMeRe project aims to build a kernel corpus of different Computer-Mediated Communication (CMC) genres with interactions in French as the main language, by assembling interactions stemming from networks such as the Internet or telecommunication, as well as mono and multimodal, synchronous and asynchronous communications. Corpora are assembled using a standard, thanks to the TEI (Text Encoding Initiative) format. This implies extending, through a European endeavor, the TEI model of text, in order to encompass the richest and the more complex CMC genres. This paper presents the Interaction Space model. We explain how this model has been encoded within the TEI corpus header and body. The model is then instantiated through the first four corpora we have processed: three corpora where interactions occurred in single-modality environments (text chat, or SMS systems) and a fourth corpus where text chat, email and forum modalities were used simultaneously. The CoMeRe project has two main research perspectives: Discourse Analysis, only alluded to in this paper, and the linguistic study of idiolects occurring in different CMC genres. As NLP algorithms are an indispensable prerequisite for such research, we present our motivations for applying an automatic annotation process to the CoMeRe corpora. Our wish to guarantee generic annotations meant we did not consider any processing beyond morphosyntactic labelling, but prioritized the automatic annotation of any freely variant elements within the corpora. We then turn to decisions made concerning which annotations to make for which units and describe the processing pipeline for adding these. All CoMeRe corpora are verified, thanks to a staged quality control process, designed to allow corpora to move from one project phase to the next.
Public release of the CoMeRe corpora is a short-term goal: corpora will be integrated into the forthcoming French National Reference Corpus, and disseminated through the national linguistic infrastructure ORTOLANG. We, therefore, highlight issues and decisions made concerning the OpenData perspective.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
</country>
<region>
<li>Île-de-France</li>
</region>
<settlement>
<li>Paris</li>
</settlement>
<orgName>
<li>Université Paris 13</li>
</orgName>
</list>
<tree>
<country name="France">
<noRegion>
<name sortKey="Chanier, Thierry" sort="Chanier, Thierry" uniqKey="Chanier T" first="Thierry" last="Chanier">Thierry Chanier</name>
</noRegion>
<name sortKey="Antoniadis, Georges" sort="Antoniadis, Georges" uniqKey="Antoniadis G" first="Georges" last="Antoniadis">Georges Antoniadis</name>
<name sortKey="Hriba, Linda" sort="Hriba, Linda" uniqKey="Hriba L" first="Linda" last="Hriba">Linda Hriba</name>
<name sortKey="Longhi, Julien" sort="Longhi, Julien" uniqKey="Longhi J" first="Julien" last="Longhi">Julien Longhi</name>
<name sortKey="Poudat, Celine" sort="Poudat, Celine" uniqKey="Poudat C" first="Céline" last="Poudat">Céline Poudat</name>
<name sortKey="Sagot, Benoit" sort="Sagot, Benoit" uniqKey="Sagot B" first="Benoit" last="Sagot">Benoit Sagot</name>
<name sortKey="Seddah, Djame" sort="Seddah, Djame" uniqKey="Seddah D" first="Djamé" last="Seddah">Djamé Seddah</name>
<name sortKey="Wigham, Ciara R" sort="Wigham, Ciara R" uniqKey="Wigham C" first="Ciara R." last="Wigham">Ciara R. Wigham</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Ticri/explor/TeiVM2/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000011 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000011 | SxmlIndent | more
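
The record printed by such a command is XML, so it could be post-processed with a general-purpose parser. The following is a minimal sketch in Python, not a Dilib feature: it lists the authors grouped by country under the `<tree>` element. The function name `author_names` and the embedded `SAMPLE` fragment are assumptions made for illustration; the fragment is a shortened copy of the `<affiliations>` block of this record.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a shortened copy of the <affiliations> block of this
# record (only three of the eight <name> entries, with fewer attributes).
SAMPLE = """<affiliations>
<tree>
<country name="France">
<noRegion>
<name sortKey="Chanier, Thierry" first="Thierry" last="Chanier">Thierry Chanier</name>
</noRegion>
<name sortKey="Hriba, Linda" first="Linda" last="Hriba">Linda Hriba</name>
<name sortKey="Seddah, Djame" first="Djamé" last="Seddah">Djamé Seddah</name>
</country>
</tree>
</affiliations>"""


def author_names(xml_text):
    """Return (country, author) pairs found in the record, in document order."""
    root = ET.fromstring(xml_text)
    pairs = []
    for country in root.iter("country"):
        # Skip <country> elements that have no name attribute
        # (for example the <country> entry of a flat <list>).
        label = country.get("name")
        if label is None:
            continue
        # iter() also reaches <name> elements nested in <noRegion>.
        for name in country.iter("name"):
            pairs.append((label, name.text))
    return pairs


if __name__ == "__main__":
    for country, author in author_names(SAMPLE):
        print(f"{country}: {author}")
```

The full record also contains prefixed elements such as `hal:affiliation`; their namespace declarations are not visible in this excerpt, so parsing the whole record this way may require those declarations to be present.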

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Ticri
   |area=    TeiVM2
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Hal:halshs-00953507
   |texte=   The CoMeRe corpus for French: structuring and annotating heterogeneous CMC genres
}}

This area was generated with Dilib version V0.6.31.
Data generation: Mon Oct 30 21:59:18 2017. Site generation: Sun Feb 11 23:16:06 2024